27. Identify the Most Powerful Features
Question:
Take your (overfit) decision tree and use the feature_importances_ attribute to get a list of the relative importance of all the features being used. We suggest iterating through this list (it’s long, since this is text data) and printing out the feature importance only if it’s above some threshold (say, 0.2; remember, if all words were equally important, each one would have an importance of far less than 0.01).
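A minimal sketch of that loop, assuming `clf` is your fitted decision tree so that `clf.feature_importances_` holds one importance value per feature. The helper name `important_features` and the sample values are hypothetical; the sample list only stands in for the real array so the snippet runs on its own.

```python
def important_features(importances, threshold=0.2):
    """Return (index, importance) pairs whose importance exceeds threshold.

    `importances` is any sequence of floats, e.g. clf.feature_importances_
    from a fitted decision tree (one entry per feature).
    """
    selected = []
    for index, importance in enumerate(importances):
        # Keep only features whose importance is above the threshold.
        if importance > threshold:
            selected.append((index, importance))
    return selected


# Hypothetical importances standing in for clf.feature_importances_.
example = [0.0, 0.05, 0.0, 0.76, 0.0, 0.19]
for index, importance in important_features(example):
    print(index, importance)
```

On the real tree you would call `important_features(clf.feature_importances_)`; with the sample values above, the loop prints only the entry for feature 3.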
What’s the importance of the most important feature? What is the number of this feature, i.e. its index in the list?
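To answer both questions at once, a sketch like the following returns the largest importance together with its position in the array; that position is the feature's number. The helper name `most_important_feature` and the sample values are hypothetical; on the real tree you would pass `clf.feature_importances_`, assuming `clf` is the fitted classifier.

```python
def most_important_feature(importances):
    """Return (index, importance) for the largest importance.

    `importances` is any non-empty sequence of floats, e.g. the
    hypothetical clf.feature_importances_ of a fitted decision tree.
    Ties go to the lowest index, because max() keeps the first maximum.
    """
    best_index = max(range(len(importances)), key=lambda i: importances[i])
    return best_index, importances[best_index]


# Hypothetical importances standing in for clf.feature_importances_.
sample = [0.0, 0.05, 0.0, 0.76, 0.0, 0.19]
print(most_important_feature(sample))
```

If NumPy is at hand, `numpy.argmax(clf.feature_importances_)` gives the same index; like `max`, it returns the first position when several features tie.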

INSTRUCTOR NOTE:
Special Note: Depending on when you downloaded the code provided for find_signature.py, you may need to change lines 9-10 of that file to:
words_file = "../text_learning/your_word_data.pkl"
authors_file = "../text_learning/your_email_authors.pkl"
so that the script loads the files created by running vectorize_text.py.